Abstract

Two genetic loci, one in the cytochrome P450 1A1 (CYP1A1) and 1A2 (CYP1A2) gene region (rs2472297) and one near the
aryl-hydrocarbon receptor (AHR) gene (rs6968865), have been associated with habitual caffeine consumption. We sought to 
establish whether a more refined and comprehensive assessment of caffeine consumption would provide stronger evidence 
of association, and whether a combined allelic score comprising these two variants would further strengthen the 
association. We used data from between 4,460 and 7,520 women in the Avon Longitudinal Study of Parents and Children, a 
longitudinal birth cohort based in the United Kingdom. Self-report data on coffee, tea and cola consumption (including 
consumption of decaffeinated drinks) were available at multiple time points. Both genotypes were individually associated 
with total caffeine consumption, and with coffee and tea consumption. There was no association with cola consumption, 
possibly due to low levels of consumption in this sample. There was also no association with measures of decaffeinated 
drink consumption, indicating that the observed association is most likely mediated via caffeine. The association was 
strengthened when a combined allelic score was used, accounting for up to 1.28% of phenotypic variance. This was not 
associated with potential confounders of observational association. A combined allelic score accounts for sufficient 
phenotypic variance in caffeine consumption that this may be useful in Mendelian randomization studies. Future studies 
may therefore be able to use this combined allelic score to explore causal effects of habitual caffeine consumption on 
health outcomes. 
Introduction

Caffeine is one of the most widely-consumed psychoactive 
substances world-wide, and while coffee and tea consumption 
dominate, it is also present in some soft drinks [1]. There is also 
considerable inter-individual variability in preference for caffeine 
[2], in part due to genetic factors. Twin studies have consistently 
indicated substantial (~50%) heritability of caffeine consumption 
(typically assessed as coffee consumption) [3-9]. Recently, a 
number of genome-wide association studies have identified 
variants robustly associated with caffeine consumption (again, 
typically assessed as coffee consumption) [10-12]. In particular, 
two loci, one in the cytochrome P450 1A1 (CYP1A1) and 1A2 
(CYP1A2) gene region on chromosome 15 and one near the aryl- 
hydrocarbon receptor (AHR) gene on chromosome 7, have been 
found to be associated with habitual caffeine consumption across a 
number of studies [10-13]. Two single nucleotide polymorphisms,
rs2472297 in between CYP1A1 and CYP1A2, and rs6968865
51 kb upstream of AHR, provide the strongest signals, each with 
an effect equivalent to an increased consumption of ~0.2 cups per 
day per risk (T) allele. The genes are biologically plausible 
candidates for caffeine consumption phenotypes as they both 
encode members of the same biochemical pathway. AHR is 
known to induce CYP1A1 and CYP1A2 by binding to the DNA in 
the region between these two genes [12], and low CYP1A2 activity 
has been associated with higher caffeine toxicity [14]. 

A limitation of studies to date is that they have typically used a 
single measure of caffeine consumption (e.g., coffee). One study 
[11] measured total caffeine consumption, but coffee contributed
towards 80% of this, and data on other sources of caffeine were 
not reported separately. While coffee represents the major source 
of caffeine consumption in some countries, other sources of 
caffeine can be important. We have previously shown that 
phenotypic assessments which more accurately capture the 
exposure of interest can improve the precision of genetic
association studies [15], particularly when the exposure (e.g., 
caffeine consumption) is strongly influenced by behaviour or 
behavioural choices (e.g., preference for coffee or tea). We 
therefore sought to establish whether using a more comprehensive 
phenotypic assessment of caffeine consumption, using measures of 
coffee, tea and cola consumption, would provide stronger evidence 
of association with rs2472297 and rs6968865. We were also 
interested in whether a combined allelic score comprising these 
two variants would further strengthen the association with caffeine 
consumption. 

Materials and Methods 

Study Sample 

The Avon Longitudinal Study of Parents and Children 
(ALSPAC) sample is a longitudinal birth cohort that comprises 
20,248 pregnancies. The mothers of 14,541 (71.8%) pregnancies 
were recruited antenatally during 1990-92 (Phase I). Post-natal 
recruitment to the 'Focus@7' clinical assessment at the age of ~7 
years recruited a further 456 children from 452 (2.2% of eligible) 
pregnancies (Phase II). Recruitment during ages 8-18 years (Phase 
III) added a further 257 children from 254 (1.2% of eligible) 
pregnancies, giving an overall total of 15,247 (75.3% of eligible) 
enrolled pregnancies; from these pregnancies there were 14,775 
live-born children of which 14,701 were alive at one year of age. 
The phases of enrolment are described in more detail in the cohort 
profile paper [16]. The ALSPAC website contains details of all the 
data that are available through a fully searchable data dictionary:
http://www.bristol.ac.uk/alspac/researchers/data-access/data-dictionary/.
Ethics approval for the study was obtained from
the ALSPAC Ethics and Law Committee and the Local Research 
Ethics Committees (Bristol and Weston Health Authority, South- 
mead Health Authority, Frenchay Health Authority). 

Measures of Caffeine Consumption 

Data on coffee and tea consumption were collected via self- 
report during pregnancy at 8, 18 and 32 weeks gestation and 2, 47, 
85, 97 and 145 months after delivery. Participants were asked to 
report "current daily coffee and tea drinking", as number of 
drinks, separately for weekdays and weekends. Similar questions 
were asked for cola consumption in drinks per week. For cola 
consumption, questions were open format at 8, 18, and 32 weeks 
gestation, and 2 months after delivery, and closed format at later 
time points ("never or rarely", "once in 2 weeks", "1 to 3 times a 
week", "4 to 7 times a week", "once a day or more"). Closed 
format responses were recoded to 0, 0.5, 2, 5.5 and 7 drinks per 
week, and cola consumption values further recoded to reflect daily 
consumption. Outlying daily consumption values (>10 drinks for 
coffee, >15 drinks for tea and >21 drinks for cola) were coded as 
missing data. Similar questions were also asked for decaffeinated 
coffee, tea and cola consumption at the same time points, and 
coded in the same way. In order to obtain a measure of total daily 
caffeine consumption, number of cups of tea and coffee were 
summed with drinks per day of cola, weighted with respect to 
approximate caffeine content (coffee 75; tea 40; cola 34.5) [17,18]. 
The distribution of total caffeine consumption, and coffee and tea 
consumption, is shown in Figures S1-S3.
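The recoding and weighting steps described above can be sketched as follows. This is a minimal illustration rather than the study's own code: the function names are hypothetical, the rule that a total is missing whenever any component is missing or outlying is an assumption, and the combination of weekday and weekend reports is not specified in the text and is therefore left out.

```python
# Approximate caffeine content per drink (mg), as stated in the text.
CAFFEINE_MG = {"coffee": 75.0, "tea": 40.0, "cola": 34.5}

# Closed-format cola responses recoded to drinks per week, as stated in the text.
COLA_PER_WEEK = {
    "never or rarely": 0.0,
    "once in 2 weeks": 0.5,
    "1 to 3 times a week": 2.0,
    "4 to 7 times a week": 5.5,
    "once a day or more": 7.0,
}

# Daily consumption above these limits is coded as missing, as stated in the text.
MAX_DAILY = {"coffee": 10.0, "tea": 15.0, "cola": 21.0}


def cola_per_day(response):
    """Convert a closed-format cola response to drinks per day."""
    return COLA_PER_WEEK[response] / 7.0


def total_caffeine_mg(coffee, tea, cola):
    """Total daily caffeine (mg) from drinks per day.

    Returns None if any component is missing (None) or outlying.
    Treating the whole total as missing in that case is an assumption.
    """
    drinks = {"coffee": coffee, "tea": tea, "cola": cola}
    for name, value in drinks.items():
        if value is None or value > MAX_DAILY[name]:
            return None
    return sum(CAFFEINE_MG[name] * value for name, value in drinks.items())
```

For example, two coffees, three teas and a "once a day or more" cola response would give 2 x 75 + 3 x 40 + 1 x 34.5 mg of caffeine per day under these weights.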

Genotyping 

Genotypes at the CYP1A1 (rs2472297) and AHR (rs6968865) 
loci were available from GWAS genotyping data. A total of 10,015 
ALSPAC mothers were genotyped on the Illumina 660K quad 
chip at the Centre National de Genotypage, Paris, resulting in
557,124 directly genotyped SNPs before quality control. Genotypes
were called with Illumina GenomeStudio, and PLINK
(v1.07) was used to carry out quality control steps.

Individuals were excluded from further analysis on the basis of
having incorrect sex assignments; minimal or excessive heterozygosity;
disproportionate levels of individual missingness (>5%);
evidence of cryptic relatedness (>10% identical by descent); or
being of non-European ancestry (as detected by a multidimensional
scaling analysis seeded with HapMap 2 individuals). SNPs
with a minor allele frequency of <1% or a call rate of <95%
were removed. Furthermore, only SNPs which passed an exact
test of Hardy-Weinberg equilibrium (P > 5×10⁻⁶) were considered
for further use. Population stratification was assessed by
means of multidimensional scaling of genome-wide identity by 
state (IBS) pairwise distances using the four (YOR, CEU, CHB, 
JPT) HapMap populations as a reference. Cryptic relatedness was 
assessed using estimates of the proportion of SNPs expected to be 
identical by descent given estimates of IBS. Subjects with a
relatedness of 0.1 or higher were excluded. Genotypes were
imputed with Markov Chain Haplotyping software (MaCH
1.0.16) [45] using CEPH individuals from phase 2 of the
HapMap project as a reference set (release 22). SNP rs2472297
was directly genotyped, had a MAF of 0.27, HWE P-value of 0.1
and 0.02% missingness before imputation. SNP rs6968865 was 
imputed with an imputation quality of 0.96, and MAF of 0.39. 
After imputation genotypes were available for 8,340 subjects. The 
frequencies of the T allele were 0.27 in rs2472297 and 0.61 in 
rs6968865. 

Statistical Analysis 

Data on total caffeine consumption, and consumption of tea, 
coffee, cola and their decaffeinated counterparts, were analysed in 
a linear regression on number of T alleles in a univariate analysis 
of each SNP. Linear regression was carried out using the lm
function in R (v. 2.14.0). Best-guess genotypes were used for
analysis. 
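As an illustration of the univariate model, the sketch below fits an ordinary least squares regression of a phenotype on the number of T alleles (0, 1 or 2) and returns the per-allele effect together with the proportion of variance explained. It is written in plain Python rather than R, the function name is hypothetical, and the data in the usage note are invented for illustration only.

```python
def simple_ols(t_alleles, phenotype):
    """Least-squares fit of phenotype on T-allele count.

    Returns (slope, intercept, r_squared), where slope is the
    per-allele effect and r_squared is the variance explained.
    """
    n = len(t_alleles)
    mean_x = sum(t_alleles) / n
    mean_y = sum(phenotype) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in t_alleles)
    syy = sum((y - mean_y) ** 2 for y in phenotype)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(t_alleles, phenotype))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared
```

With invented values such as allele counts [0, 0, 1, 1, 2, 2] and caffeine intakes [200, 220, 215, 235, 230, 250] mg, the slope is the estimated change in daily caffeine consumption per additional T allele.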

To obtain joint effects to take into account genotypes at both 
SNPs simultaneously, following Sulem and colleagues [12], the 
number of T alleles was summed across SNPs to derive a
combined SNP score of the total number of T alleles per subject 
which was then used in a regression with phenotype data. For 
rs6968865 the T allele is the major allele, so that the SNP score 
contained one minor allele and one major (i.e., reference) allele. 
Weighting alleles using effect sizes obtained from Sulem and 
colleagues [12] (rs2472297 by 0.31, rs6968865 by 0.26) provided 
similar results and we present the results for the unweighted SNP 
score for simplicity. 
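The two scoring schemes described above can be sketched as follows. The weights are those quoted in the text from Sulem and colleagues [12]; the function names and the dictionary-based input format are assumptions made for illustration.

```python
# Per-allele weights quoted in the text from Sulem and colleagues [12].
EFFECT_WEIGHTS = {"rs2472297": 0.31, "rs6968865": 0.26}


def _validate(t_counts):
    """Check that each SNP is known and each T-allele count is 0, 1 or 2."""
    for snp, count in t_counts.items():
        if snp not in EFFECT_WEIGHTS:
            raise KeyError(snp)
        if count not in (0, 1, 2):
            raise ValueError("T-allele count must be 0, 1 or 2")


def unweighted_score(t_counts):
    """Total number of T alleles across both SNPs (range 0 to 4)."""
    _validate(t_counts)
    return sum(t_counts.values())


def weighted_score(t_counts):
    """Sum of T-allele counts, each multiplied by its effect-size weight."""
    _validate(t_counts)
    return sum(EFFECT_WEIGHTS[snp] * count for snp, count in t_counts.items())
```

A participant with one T allele at rs2472297 and two at rs6968865 would have an unweighted score of 3 and a weighted score of 0.31 + 2 x 0.26.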

We examined within-locus non-additivity by testing the 
significance of a second heterozygote term, and between-locus 
non-additivity by testing for a joint effect beyond the sum of the 
effects of both SNPs individually. Our results indicated that these 
SNPs act additively, and their effects are independent (although 
we cannot rule out more complicated interactions between these 
SNPs in the presence of other factors). 
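To make the within-locus check concrete: in a model containing an additive term and a heterozygote indicator, the heterozygote coefficient equals the difference between the heterozygote mean and the midpoint of the two homozygote means. The sketch below computes only that point estimate, using a hypothetical function name; it does not reproduce the significance test used in the study.

```python
def dominance_deviation(t_counts, phenotype):
    """Heterozygote mean minus the midpoint of the homozygote means.

    A value near zero is what an additive (per-allele) model predicts.
    """
    groups = {0: [], 1: [], 2: []}
    for count, value in zip(t_counts, phenotype):
        groups[count].append(value)
    if any(len(values) == 0 for values in groups.values()):
        raise ValueError("all three genotype groups are needed")
    # Mean phenotype within each genotype group.
    m0, m1, m2 = (sum(groups[g]) / len(groups[g]) for g in (0, 1, 2))
    return m1 - (m0 + m2) / 2.0
```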

Data used for this submission will be made available on request 
to the ALSPAC executive committee (alspac-exec@bristol.ac.uk). 
The ALSPAC data management plan (available here: http:// 
www.bristol.ac.uk/alspac/researchers/data-access/) describes in 
detail the policy regarding data sharing, which is through a 
system of managed open access. 
Results

Characteristics of Participants 

The total sample available for analysis comprised between 4,460 
and 7,520 women (see Figure 1 for a summary of how this sample 
was arrived at). Levels of missingness were low unless questions on 
caffeine consumption were not included in one or more versions of 
the questionnaire at that time point. More information on 
ALSPAC mothers' response rates has been published previously 
[16]. 

Consumption of coffee tended to increase roughly linearly 
across time points (means 1.18 to 2.30 drinks per day). 
Consumption of tea (means 2.73 to 3.18 drinks per day) and cola 
(means 0.60 to 2.31 drinks per week) varied across time points, but 
with no clear pattern of change. As a result, total daily caffeine 
consumption tended to increase across time points (means 
206.8 mg to 306.1 mg). These data are shown in Tables 1-4. In 
general, cola consumption was considerably less than tea and 
coffee consumption, reflecting approximately 4% to 11% of total 
caffeine consumption in drinks per day. 

Caffeine Consumption 

Across all time points, total caffeine consumption was associated
with both CYP1A1 (βs = 8.7 to 21.4, Ps = 1.59×10⁻³ to
3.33×10⁻¹⁰) and AHR (βs = 4.0 to 14.6, Ps = 1.15×10⁻¹ to
3.34×10⁻⁶) genotypes (Table 1). Similarly, total caffeine consumption
was also associated with the combined SNP score, and
the statistical evidence for this association was considerably stronger
(βs = 5.9 to 17.1, Ps = 1.15×10⁻³ to 3.74×10⁻¹⁴).

In general, the proportion of phenotypic variance explained 
across all time points was small, as would be expected for the 
association of common variants with complex behavioural 
phenotypes. For CYP1A1, the proportion of phenotypic variance 
explained ranged from 0.15% to 0.88%, while for AHR it ranged 
from 0.04% to 0.48%. However, the combined SNP score 
accounted for a somewhat higher proportion of phenotypic 
variance on average, ranging from 0.16% to 1.28%. 

Estimates of the proportion of phenotypic variance obtained
using GCTA [19] for the two SNPs in the 2-SNP score were
broadly similar to those obtained using linear regression (0.10% to
1.10% vs 0.16% to 1.28%). GCTA analysis for the remaining
directly-genotyped SNPs available accounted for additional
phenotypic variance, although these estimates may be unreliable
due to relatively small sample size (see Table S1).

Figure 1. Study Participant Flow Diagram. Due to study attrition,
data obtained when the cohort first started have a higher number of
responses than variables collected later. Thus the number of
participants on whom data are available is given as a range.
doi:10.1371/journal.pone.0103448.g001

Stratified analyses further indicated that these associations were
present for consumption of coffee (combined SNP score: βs =
0.047 to 0.120, Ps = 2.34×10⁻² to 5.46×10⁻⁵) and tea (combined
SNP score: βs = 0.076 to 0.209, Ps = 2.58×10⁻² to 1.23×10⁻⁸),
but not cola (combined SNP score: βs = -0.046 to 0.032, Ps
= 9.15×10⁻¹ to 5.51×10⁻²) (Tables 2-4). Interestingly, associations
for tea consumption were generally stronger than for coffee
consumption. Removing participants who reported zero consumption
of coffee, tea and/or cola did not alter these results
substantially.

There was no evidence that either AHR or CYP1A1 genotypes, 
or the combined SNP score, was associated with consumption of 
decaffeinated coffee, tea or cola (see Tables S2-S4), indicating that 
the associations observed are specific to caffeinated drinks. Again, 
removing participants who reported zero consumption of coffee, 
tea and/or cola did not alter these results substantially. We also 
did not observe any association with measures of aversion to 
coffee, tea or cola taken during pregnancy (data available on 
request). 

Potential Confounders 

Next we assessed the association of the combined SNP score 
with potential confounders (year of birth, educational attainment, 
measures of socioeconomic position, alcohol use, tobacco use). 
These indicated no evidence of association (Table 5), suggesting 
that the combined SNP score may be a useful instrumental 
variable in Mendelian randomization analyses [20,21]. This is in 
contrast with the association of total caffeine consumption with 
the same potential confounders, which shows very strong 
evidence of association at multiple time points (Table 6). A full 
description of these variables is provided in the ALSPAC cohort 
profile [16]. 

Discussion 

Our results confirm that two SNPs in AHR and CYP1A1 are 
associated with caffeine consumption, and extend previous 
findings in two important ways. First, our results are the first to 
show association in a sample where caffeine consumption via 
caffeinated beverages other than coffee is common. Moreover, we 
show that a combined caffeine consumption phenotype derived 
from measures of consumption of three caffeinated beverages 
(coffee, tea and cola) provides a stronger signal than any one of 
these measures separately. Second, our results also confirm that 
these results are due to caffeine consumption, rather than some 
other common characteristic of caffeinated beverages. By using 
measures of consumption of decaffeinated drinks as negative 
controls we show no evidence of association with either AHR or 
CYP1A1. While our results hold for both SNPs individually, our 
strongest results are obtained when both SNPs are combined to 
create a 2-SNP genetic risk score. 

Observationally, caffeine (or, more commonly, coffee) con- 
sumption has been shown to be associated with a number of health 
outcomes [22]. Evidence from longitudinal studies suggests that 
long-term coffee consumption may in fact be protective against 
cardiovascular disease [22,23] and lower the risk of all-cause 
mortality [24]. Coffee consumption also shows an inverse 
association with diabetes, although this may be due to antioxidant 
compounds within coffee rather than caffeine itself [23]. Observational
studies suggest that coffee consumption may have further
beneficial health effects, including reducing risk of several 
cancers, such as endometrial, liver and prostate cancer [25-27] 
and protecting against depression, attention deficit hyperactivity 
disorder and Alzheimer disease [28-30]. Conversely, it is 
recommended that caffeine consumption is restricted during 
pregnancy due to its association with adverse pregnancy 
outcomes such as intrauterine growth retardation and miscar- 
riage [31,32]. Observational studies also suggest that caffeine 
consumption may be detrimental to bone health, leading to 
increased fracture risk [33]. However, these studies all suffer from 
the usual problems of residual confounding and reverse causality 
which limit the causal inferences that can be drawn from 
observational data. 

Mendelian randomization (MR) offers one approach to better 
understanding the causal nature of the observed associations 
between caffeine consumption and health outcomes. Genetic
variants are randomly assorted during gamete formation and 
conception, and therefore should be unrelated to other lifestyle 
factors associated with coffee consumption which may confound 
observational associations [34]. Health outcomes cannot affect the
genes that an individual has, so we know that associations from 
MR analyses are not due to reverse causality [34]. This may be 
particularly important in observational studies of the effects of 
caffeine as individuals may alter levels of caffeine consumption in 
response to ill health. In addition, caffeine consumption is difficult 
to measure accurately as it is usually obtained from food frequency 
questionnaires [35], so observational estimates may be biased by 
random or non-random measurement error. In contrast, MR can 
provide accurate estimates of the magnitude of lifelong exposure to 
a risk factor [36].

Critically, we have shown that the two SNPs in AHR and 
CYP1A1, and our 2-SNP genetic risk score, are not associated 
with a range of potential confounders that may give rise to 
spurious associations in studies of health outcomes putatively
related to caffeine consumption. This, together with the clear 
evidence of association with caffeine consumption, indicates that 
the 2-SNP genetic risk score could be used as an instrumental 
variable in MR analyses. The greater variance explained by the 
combined score would increase statistical power and reduce the 
sample size required to detect associations with health 
outcomes, compared to using either SNP individually. The 
risk score explains up to 1.3% of the variance in caffeine 
consumption, which although small in absolute terms is 
relatively large by the standards of common genetic variants. 
This is comparable to the variance explained in body mass 
index (BMI) by variants in the FTO gene, and in cigarette 
consumption by variants in the CHRNA5-A3-B4 gene cluster 
[15,37], which have been used in MR studies of the causal 
effects of BMI and smoking on health outcomes [38-41]. The 2- 
SNP score for caffeine consumption may therefore be a suitable 
instrument to explore the causal effects of caffeine consumption 
on a range of health outcomes. 
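The link between variance explained and instrument strength can be made concrete with the first-stage F statistic, which is often approximated from the variance explained (R²), the sample size (n) and the number of instruments (k). This calculation is not reported in the present study; the sketch below is offered only as an illustration, with a hypothetical function name and inputs supplied by the reader.

```python
def first_stage_f(r_squared, n, k=1):
    """Approximate first-stage F statistic for k instruments.

    F = R^2 * (n - k - 1) / ((1 - R^2) * k)
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    if n <= k + 1:
        raise ValueError("n must exceed k + 1")
    return (r_squared * (n - k - 1)) / ((1.0 - r_squared) * k)
```

For a single combined score (k = 1), a larger R² at a fixed sample size gives a larger F, which is the sense in which the combined score would be a stronger instrument than either SNP alone.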

There are some limitations to this study that should be 
considered when interpreting our results. First, caffeine 
consumption was measured using a food frequency question- 
naire, and these may have modest reliability and validity [35]. 
We were also only able to capture tea, coffee and cola drinks as 
sources of dietary caffeine, and not other sources (e.g., 
chocolate). However, tea, coffee and soft drinks (including cola) 
together account for ~90% of caffeine consumption in similar 
populations, and the levels of consumption we observed are 
similar to those observed in other studies [32]. While more 
detailed assessments of caffeine consumption are possible, these
are difficult to obtain on the scale necessary for genetic
association studies. Future studies could obtain more detailed 
phenotypic information on selected, genetically-informative 
individuals [42]. Second, levels of cola consumption were low
in this sample, so that this, together with the relatively low levels 
of caffeine in cola drinks, may account for the lack of association 
observed. It is also possible that participants were responding to 
questions about "cola" consumption at least in part as questions 
about all soda consumption. To better understand whether this 
lack of association is genuine will require the study of 
populations where levels of cola consumption are higher. Third, 
our sample was restricted to women only. Rates of caffeine 
consumption may differ between men and women, although 
there are no clear reasons to expect that the pattern of results we 
observed would differ in males. While patterns of consumption 
during pregnancy may not be typical, our data extend to ~12
years post-pregnancy. It is likely that the women in our sample 
reverted to pre-pregnancy patterns of caffeine consumption over 
time. Fourth, we only included 2 SNPs in our analysis. These 
were chosen on the basis of being those for which there is the 
clearest evidence from recent GWAS of caffeine consumption. 
Future studies may extend our 2-SNP score by including further 
variants. Fifth, although we are optimistic that these genotypes, 
and the 2-SNP score, can be used as instrumental variables in 
MR analyses, potential pleiotropic effects will need to be 
considered. Metabolic enzyme genotypes typically relate to 
several metabolic differences which may give rise to associations
with health outcomes. In principle, this can be tested by 
examining the association of genotype with health outcome 
separately in those who do and do not consume caffeinated 
drinks [43] - the genotype should not be associated with the 
outcome in the latter group if the association is mediated via 
caffeine consumption (although this can give rise to collider bias 
[44]). Finally, participants of non-European ancestry were 
excluded during preparation of GWAS data, given that 
differences in ancestry can bias genetic association studies. 
Therefore, genotypes were only available for participants of 
European ancestry. However, >95% of ALSPAC participants 
are of European ancestry, so we think it unlikely that this 
influenced our results. 

In conclusion, our data confirm the association of AHR and 
CYP1A1 genotypes with caffeine consumption, and extend 
previous work by showing that this association holds for tea 
consumption as well as coffee consumption. Moreover, no 
association is observed for decaffeinated tea or coffee consump- 
tion. This strengthens the argument that the association is 
mediated via caffeine consumption, although it remains possible 
that other compounds present in both tea and coffee mediate this 
association. Future work, perhaps selecting participants on the 
basis of AHR and CYP1A1 genotype, could explore this possibility
through the administration of caffeine in a laboratory setting. 
Finally, the relatively large proportion of variance in caffeine 
consumption accounted for by the combined SNP score, and the 
lack of association of this with potential confounders, means that it 
could be used in Mendelian randomization studies to explore the 
causal effects of habitual caffeine consumption on health-related 
outcomes. 

Supporting Information 

Figure S1 Distribution of total caffeine consumption (mg).

(TIF)
Figure S2 Distribution of total coffee consumption
(cups per day). 

(TIF) 

Figure S3 Distribution of total tea consumption (cups 
per day). 

(TIF) 

Table S1 Variance in total caffeine consumption explained
using linear regression and GCTA.

(DOCX) 

Table S2 Association of CYP1A1 rs2472297, AHR 
rs6968865 and combined genetic score with decaffeinat- 
ed coffee consumption. 

(DOCX) 

Table S3 Association of CYP1A1 rs2472297, AHR 
rs6968865 and combined genetic score with decaffeinat- 
ed tea consumption. 

(DOCX) 